Abstract

We show that learning a convex body in $\mathbb{R}^d$, given random samples
from the body, requires $2^{\Omega(\sqrt{d/\epsilon})}$ samples. By learning a convex body
we mean finding a set having at most $\epsilon$ relative symmetric difference
with the input body. To prove the lower bound we construct a hard-to-learn
family of convex bodies. Our construction of this family is very
simple and based on error-correcting codes.

1 Introduction 

We consider the following problem: Given uniformly random points from a
convex body in $\mathbb{R}^d$, we would like to approximately learn the body with as
few samples as possible. In this question, and throughout this paper, we are
interested in the number of samples but not in the computational requirements
for constructing such an approximation. Our main result will show
that this needs about $2^{\Omega(\sqrt{d})}$ samples. This problem is a special case of the
statistical problem of inferring information about a probability distribution
from samples. For example, one can approximate the centroid of the body
with a sample of size roughly linear in $d$. On the other hand, a sample of
size polynomial in $d$ is not enough to approximate the volume of a convex
body within a constant factor ([3], and see Section 5 here for a discussion).
Note that known approximation algorithms for the volume (e.g., [2]) do not
work in this setting as they need a membership oracle and random points
from various carefully chosen subsets of the input body.

Our problem also relates to work in learning theory (e.g., [12, 8]), where
one is given samples generated according to (say) the Gaussian distribution
and each sample is labeled "positive" or "negative" depending on whether
it belongs to the body. Aside from different distributions, another difference
between the learning setting of [8] and ours is that in ours one gets only
positive examples. Klivans et al. [8] give an algorithm and a nearly matching
lower bound for learning convex bodies with labeled samples chosen according
to the Gaussian distribution. Their algorithm takes time $2^{\tilde{O}(\sqrt{d})}$ and
they also show a lower bound of $2^{\Omega(\sqrt{d})}$.

The problem of learning convex sets from uniformly random samples 
from them was raised by Frieze et al. [4]. They gave a polynomial time
algorithm for learning parallelepipeds. Another somewhat related direction
is the work on the learnability of discrete distributions by Kearns et al. [7]. 

Our lower bound result (like that of [8]) also allows for membership oracle
queries. Note that it is known that estimating the volume of convex bodies
requires an exponential number of membership queries if the algorithm is
deterministic [1], which implies that learning bodies requires an exponential
number of membership queries because if an algorithm can learn the body
then it can also estimate its volume.

To formally define the notion of learning we need to specify a distance
$d(\cdot,\cdot)$ between bodies. A natural choice in our setting is to consider the total
variation distance of the uniform distribution on each body (see Section 2).

We will use the term random oracle of a convex body K for a black box 
that when queried outputs a uniformly random point from K. 

Theorem 1. There exists a distribution $\mathcal{D}$ on the set of convex bodies in
$\mathbb{R}^d$ satisfying the following: Let ALG be a randomized algorithm that, given
a random convex body $K$ according to $\mathcal{D}$, makes at most $q$ total queries to
random and membership oracles of $K$ and outputs a set $C$ such that, for
$8/d \le \epsilon \le 1/8$,

$$\Pr(d(C,K) \le \epsilon) \ge 1/2$$

where the probability is over $K$, the random sample and any randomization
by ALG. Then

$$q \ge 2^{\Omega(\sqrt{d/\epsilon})}.$$

Remarkably, the lower bound of Klivans et al. [8] is numerically essentially
identical to ours ($2^{\Omega(\sqrt{d})}$ for $\epsilon$ constant). Constructions similar to
theirs are possible for our particular scenario [5]. We believe that our
argument is considerably simpler, and elementary compared to that of [8].
Furthermore, our construction of the hard-to-learn family is explicit. Our
construction makes use of error correcting codes. To our knowledge, this
connection with error correcting codes is new in such contexts and may find
further applications. See Section 5 for some further comparison.
An informal outline of the proof. The idea of the proof is to find a
large family of convex bodies in $\mathbb{R}^d$ satisfying two conflicting goals: (1) Any
two bodies in the family are almost disjoint; (2) and yet they look alike 
in the sense that a small sample of random points from any such body is 
insufficient for determining which one it is. Since any two bodies are almost 
disjoint, even approximating a body would allow one to determine it exactly. 
This will imply that it is also hard to approximate. 

We first construct a family of bodies that, although not almost disjoint,
have sufficiently large symmetric difference. We will then be able to construct
a family with almost disjoint bodies by taking products of bodies in
the first family.

The first family is quite natural (it is described formally in Sec. 3.1).
Consider the cross polytope $O_n$ in $\mathbb{R}^n$ (generalization of the octahedron to
$n$ dimensions: convex hull of the vectors $\{\pm e_i : i \in [n]\}$, where $e_i$ is the unit
vector in $\mathbb{R}^n$ with the $i$th coordinate 1 and the rest 0). A peak attached
to a facet $F$ of $O_n$ is a pyramid that has $F$ as its base and has its other
vertex outside $O_n$ on the normal to $F$ going through its centroid. If the
height of the peak is sufficiently small then attaching peaks to any subset
of the $2^n$ facets will result in a convex polytope. We will show later that
we can choose the height so that the volume of all the $2^n$ peaks is an $\Omega(1/n)$
fraction of the volume of $O_n$. We call this family of bodies $\mathcal{P}$. [We remark
that our construction of cross-polytopes with peaks has resemblance to a
construction in [10] with different parameters, but there does not seem to
be any connection between the problem studied there and the problem we
are interested in.]

Intuitively, a random point in a body from this family tells one that if 
the point is in one of the peaks then that peak is present, otherwise one 
learns nothing. Therefore if the number of queries is at most a polynomial 
in n, then one learns nothing about most of the peaks and so the algorithm 
cannot tell which body it got. 

But these bodies do not have large symmetric difference (it can be as small
as an $O(1/(n2^n))$ fraction of the cross polytope if the two bodies differ in just
one peak), but we can pick a subfamily of them having pairwise symmetric
difference at least $\Omega(1/n)$ by picking a large random subfamily. We will do it
slightly differently, which will be more convenient for the proof: Bodies in $\mathcal{P}$
have one-to-one correspondence with binary strings of length $2^n$: each facet
corresponds to a coordinate of the string which takes value 1 if that facet
has a peak attached, else it has value 0. To ensure that any two bodies in
our family differ in many peaks it suffices to ensure that their corresponding
strings have large Hamming distance. Large sets of such strings are of course
furnished by good error correcting codes.

From this family we can obtain another family of almost disjoint bodies 
by taking products, while preserving the property that polynomially many 
random samples do not tell the bodies apart. This product trick (also known 
as tensoring) has been used many times before, in particular for amplifying 
hardness, but we are not aware of its use in a setting similar to ours. Our 
construction of the product family also resembles the operation of
concatenation in coding theory.

Acknowledgments. We are grateful to Adam Kalai and Santosh Vempala
for useful discussions.

2 Preliminaries 

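Let $K, L \subseteq \mathbb{R}^n$ be bounded and measurable. We define a distance $\mathrm{dist}(K,L)$
as the total variation distance between the uniform distributions in $K$ and
$L$, that is,

$$\mathrm{dist}(K,L) = \begin{cases} |K \setminus L|/|K| & \text{if } |K| \ge |L|, \\ |L \setminus K|/|L| & \text{if } |L| > |K|. \end{cases}$$

We will use $|A|$ to denote the volume of sets $A \subseteq \mathbb{R}^d$, and also to denote
the cardinality of finite sets $A$; which one is meant in a particular case will
be clear from the context.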
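As a small illustration of this distance, the sketch below evaluates it exactly for axis-aligned boxes, where volumes and intersections are easy to compute. It relies on the identity dist(K, L) = 1 - |K ∩ L| / max(|K|, |L|), which follows directly from the definition of total variation distance for two uniform distributions. The box representation and the function names are assumptions made for this example; they do not appear in the paper.

```python
from math import prod


def box_volume(lo, hi):
    """Volume of the axis-aligned box [lo_1, hi_1] x ... x [lo_n, hi_n]."""
    return prod(max(h - l, 0.0) for l, h in zip(lo, hi))


def box_dist(box_k, box_l):
    """Total variation distance between the uniform distributions on two boxes.

    Each box is a pair (lo, hi) of coordinate lists (hypothetical encoding).
    """
    (lo_k, hi_k), (lo_l, hi_l) = box_k, box_l
    # The intersection of two boxes is again a box (possibly empty).
    lo_i = [max(a, b) for a, b in zip(lo_k, lo_l)]
    hi_i = [min(a, b) for a, b in zip(hi_k, hi_l)]
    inter = box_volume(lo_i, hi_i)
    larger = max(box_volume(lo_k, hi_k), box_volume(lo_l, hi_l))
    return 1.0 - inter / larger
```

Two disjoint boxes are at distance 1, identical boxes at distance 0, and a box nested in another of twice its volume is at distance 1/2.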

Let $\mathbf{1}$ denote the vector $(1, \dots, 1)$. "log" denotes logarithm with base 2.

We will need some basic definitions and facts from coding theory.
For a finite alphabet $\Sigma$, and word length $n$, a code $C$ is a subset
of $\Sigma^n$. For any two codewords $x, y \in C$, the distance $\mathrm{dist}(x,y)$ between them is
defined by $\mathrm{dist}(x,y) := |\{i \in [n] : x_i \ne y_i\}|$. The relative minimum distance
for code $C$ is $\min_{x,y \in C,\, x \ne y} \mathrm{dist}(x,y)/n$. For $\Sigma = \{0,1\}$, the weight of a
codeword $x$ is $|\{i \in [n] : x_i \ne 0\}|$. Define $V_q(n,r) := \sum_{i=0}^{r} \binom{n}{i}(q-1)^i$. The
following is well-known and easy to prove:

Theorem 2 (Gilbert-Varshamov). For alphabet size $q$, code length $n$, and
minimum distance $d$, there exists a code of size at least $q^n/V_q(n,d-1)$.

When the alphabet is $\Sigma = \{0,1\}$, we define the complement $\bar{c}$ of a
codeword $c \in C$ as $\bar{c}_i := 1 - c_i$.
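The coding-theory notions above are easy to compute for small parameters. The following is a minimal sketch, with function names chosen for this example only, of the Hamming distance, the ball volume $V_q(n,r)$, the size guaranteed by the Gilbert-Varshamov bound, and the complement of a binary word.

```python
from math import comb


def hamming_dist(x, y):
    """Number of coordinates in which the words x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def ball_volume(q, n, r):
    """V_q(n, r) = sum_{i=0}^{r} C(n, i) (q - 1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def gv_lower_bound(q, n, d):
    """Smallest integer that is at least q^n / V_q(n, d - 1)."""
    total = q ** n
    ball = ball_volume(q, n, d - 1)
    return -((-total) // ball)  # exact integer ceiling division


def complement(c):
    """Complement of a binary word: flip every coordinate."""
    return tuple(1 - b for b in c)
```

For instance, `gv_lower_bound(2, 4, 2)` evaluates the guarantee for binary words of length 4 at minimum distance 2.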

3 A hard to learn family of convex bodies 

The construction proceeds in two steps. In the first step we construct a large
subfamily of $\mathcal{P}$ such that the relative pairwise symmetric difference between
the bodies is $\Omega(1/n)$. This symmetric difference is however not sufficiently
large for our lower bound. The second step of the construction amplifies the
symmetric difference by considering products of the bodies from the first
family.

3.1 The inner family: Cross-polytope with peaks 

We first construct a family with slightly weaker properties. The family
consists of what we call "cross polytopes with peaks". The $n$-dimensional
cross polytope $O_n$ is the convex hull of the $2n$ points $\{\pm e_i, i \in [n]\}$. Let
$F$ be a facet of $O_n$, and let $c_F$ be the center of $F$. The peak associated
to $F$ is the convex hull of $F$ and the point $\alpha c_F$, where $\alpha > 1$ is a positive
scalar defined as follows: $\alpha$ is picked as large as possible so that the union
of the cross polytope and all $2^n$ peaks is a convex body. A cross polytope
with peaks will then be the union of the cross polytope and any subfamily of
the $2^n$ possible peaks. The set of all $2^{2^n}$ bodies of this type will be denoted
$\mathcal{P}$. By fixing an ordering of the facets of the cross polytope, there is a
one-to-one correspondence between the cross polytopes with peaks and 0-1
vectors with $2^n$ coordinates.

Let $P$ denote the cross polytope with all $2^n$ peaks. We will initially
choose $\alpha$ as large as possible so that the following condition (necessary
for convexity of $P$ but not clearly sufficient) is satisfied: for every pair of
adjacent facets $F$, $G$ of $O_n$, the vertex of each peak is in the following
halfspace: the halfspace containing $O_n$ and whose boundary is the hyperplane
orthogonal to the (vector connecting the origin to the) center of $F \cap G$, and
containing $F \cap G$. A straightforward computation shows that $\alpha = n/(n-1)$
for this condition. This implies by another easy computation that the
volume of all the peaks is $|O_n|/(n-1)$. We will now show that this weaker
condition on $\alpha$ is actually sufficient for the convexity of $P$ and any cross
polytope with peaks.

Proposition 3. Every set in $\mathcal{P}$ is convex.

Proof. Let $Q$ be the intersection of all halfspaces of the form

$$\{x \in \mathbb{R}^n : a \cdot x \le 1\}$$

where $a \in \mathbb{R}^n$ is a vector having entries in $\{-1,0,1\}$ and exactly one zero
entry. Equivalently, the boundary of each such halfspace is a hyperplane
orthogonal to the center of some $(n-2)$-dimensional face of $O_n$ and containing
that face.
In the rest of the proof we will show that $P = Q$, which gives the
convexity of $P$. This equality implies that a cross polytope with only some
peaks is also convex: any such body can be obtained from $P$ by intersecting
$P$ with the halfspaces induced by the facets of $O_n$ associated to the missing
peaks, and it is easy to see from the definition of the peaks that each such
intersection removes exactly one peak.

It is clear that $P \subseteq Q$. For the other inclusion, let $x \in Q$. By symmetry
we can assume $x \ge 0$. If $\sum_i x_i \le 1$, then $x \in O_n \subseteq P$. If $\sum_i x_i > 1$, we
will show that $x$ is in the peak of the positive orthant. We would like to
write $x$ as a convex combination of $e_1, \dots, e_n$ and the extra vertex of the
peak, $v = \mathbf{1}/(n-1)$. Let $\mu = (n-1)((\sum_i x_i) - 1) > 0$. We want a vector
$\lambda = (\lambda_1, \dots, \lambda_n)$ such that $x$ is a convex combination of the vertices of the
peak:

$$x = \mu v + \sum_i \lambda_i e_i = \mu v + \lambda,$$

that is, $\lambda = x + \mathbf{1} - \mathbf{1} \sum_i x_i$. It satisfies

$$\mu + \sum_i \lambda_i = 1$$

and $\lambda_j = x_j + 1 - \sum_i x_i$, and this is non-negative: By definition of $Q$ we have
for all $j \in [n]$

$$\sum_i x_i \le 1 + x_j.$$

This shows that $x$ belongs to the peak in the positive orthant. □
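The convex-combination step of this proof can be checked with exact rational arithmetic. The sketch below, using helper names invented for this illustration, computes $\mu$ and $\lambda$ for a nonnegative point and tests membership in $Q$ restricted to the positive orthant, where only the constraints $\sum_{i \ne j} x_i \le 1$ can bind.

```python
from fractions import Fraction


def in_Q_positive(x):
    """For x >= 0: check sum_{i != j} x_i <= 1 for every j.

    For nonnegative x these are the only constraints a.x <= 1 (a in
    {-1, 0, 1}^n with exactly one zero entry) that can be violated.
    """
    s = sum(x)
    return all(xj >= 0 for xj in x) and all(s - xj <= 1 for xj in x)


def peak_coefficients(x):
    """Return (mu, lam) with x = mu * v + sum_j lam[j] * e_j, v = 1/(n-1)."""
    n = len(x)
    s = sum(x)
    mu = (n - 1) * (s - 1)
    lam = [xj + 1 - s for xj in x]
    return mu, lam


def reconstruct(mu, lam):
    """Recombine the coefficients into the point mu * v + lam."""
    n = len(lam)
    return [mu * Fraction(1, n - 1) + lj for lj in lam]
```

For $n = 3$ and $x = (2/5, 2/5, 2/5)$ this gives $\mu = 2/5$ and $\lambda = (1/5, 1/5, 1/5)$, which are nonnegative and sum to 1. The apex $v$ itself lies on the boundary of $Q$, consistent with $\alpha = n/(n-1)$ being the largest admissible choice.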

For notational convenience we let $N := 2^n$. Recall that we identify
bodies in $\mathcal{P}$ with binary strings in $\{0,1\}^N$. Let $C \subseteq \{0,1\}^N$ be a code with
relative minimum distance at least 1/4. To simplify computations involving
distance "dist" between bodies, it will be convenient to have the property
that all codewords in $C$ have weight $N/2$. We can ensure this easily as
follows. Let $\hat{C} \subseteq \{0,1\}^{N/2}$ be a code with relative minimum distance at
least 1/4, then set $C := \{(c,\bar{c}) : c \in \hat{C}\}$. Clearly $|C| = |\hat{C}|$. By Theorem 2
we can choose $\hat{C}$ such that $|\hat{C}| \ge 2^{c_1 N}$, for a positive constant $c_1$. We fix $C$
to be such a code, i.e. a code with relative minimum distance at least 1/4,
size $2^{c_1 N}$, and all codewords with weight $N/2$.
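To make the balancing trick concrete, the sketch below builds a small binary code greedily and then appends to each codeword its complement. The greedy search is a stand-in chosen for this illustration (the text only invokes the existence statement of Theorem 2); it is exponential in the word length and only meant for tiny parameters. All function names are hypothetical.

```python
from itertools import product


def hamming_dist(x, y):
    """Number of coordinates in which x and y differ."""
    return sum(a != b for a, b in zip(x, y))


def greedy_code(m, d):
    """Greedily collect words of {0,1}^m with pairwise distance >= d.

    The result is maximal, so balls of radius d - 1 around its words
    cover {0,1}^m, which is the argument behind the Gilbert-Varshamov bound.
    """
    code = []
    for w in product((0, 1), repeat=m):
        if all(hamming_dist(w, c) >= d for c in code):
            code.append(w)
    return code


def balance(code):
    """Map each codeword c to (c, complement of c), doubling the length."""
    return [c + tuple(1 - b for b in c) for c in code]
```

With $N = 16$, taking `balance(greedy_code(8, 2))` yields words of length 16 and weight 8 whose pairwise distance is at least $N/4 = 4$, since concatenating complements doubles every distance.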

We define the family $\mathcal{P}_C$ as the family consisting of bodies in $\mathcal{P}$
corresponding to codewords in $C$. As all codewords in $C$ have the same weight,
we have that all bodies in $\mathcal{P}_C$ have the same volume. Recall that the volume
of each peak is $\frac{|O_n|}{2^n(n-1)}$. Therefore for distinct $P, Q \in \mathcal{P}_C$ the volume of the
symmetric difference of $P$ and $Q$ is at least $\frac{|O_n|}{4(n-1)}$.
3.2 The outer family: The product construction

Let $C'$ be a code with codewords of length $k$ and minimum distance at least
$k/2$ on the alphabet $\mathcal{P}_C$. That is, codewords in $C'$ can be represented as
$(B_1, \dots, B_k)$, where $B_i \in \mathcal{P}_C$ for $i = 1, \dots, k$. The product family $\mathcal{P}_{C'}$
corresponding to code $C'$ has $|C'|$ bodies in $\mathbb{R}^{kn}$, one for each codeword.
The body for codeword $(B_1, \dots, B_k) \in C'$ is simply $B_1 \times \dots \times B_k$.

Clearly $|\mathcal{P}_{C'}| = |C'|$. Using Theorem 2 we can choose $C'$ such that
$|C'| \ge q^k/V_q(k,k/2)$. Now note that

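$$V_q(k,k/2) = \sum_{i=0}^{k/2} \binom{k}{i}(q-1)^i \le (q-1)^{k/2} \sum_{i=0}^{k/2} \binom{k}{i} \le 2^k (q-1)^{k/2} < (4q)^{k/2}.$$

Therefore $q^k/V_q(k,k/2) > q^k/(4q)^{k/2} = (q/4)^{k/2}$. Setting $q = 2^{c_1 N}$,
we get $|C'| \ge 2^{(c_1 N - 2)k/2} \ge 2^{c_2 k N}$, for constant $c_2 > 0$, assuming $N$ is
sufficiently large. We just showed:

Lemma 4. $|\mathcal{P}_{C'}| \ge 2^{c_2 k N}$.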
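The chain of inequalities behind this lemma can be verified in exact integer arithmetic for concrete parameters. The following sketch is only a sanity check under the assumption that $k$ is even and $q \ge 2$; the function names are chosen for this example.

```python
from math import comb


def ball_volume(q, n, r):
    """V_q(n, r) = sum_{i=0}^{r} C(n, i) (q - 1)^i."""
    return sum(comb(n, i) * (q - 1) ** i for i in range(r + 1))


def check_outer_bound(q, k):
    """Check V_q(k, k/2) <= 2^k (q-1)^(k/2) < (4q)^(k/2) for even k."""
    half = k // 2
    v = ball_volume(q, k, half)
    middle = 2 ** k * (q - 1) ** half
    upper = (4 * q) ** half
    return v <= middle < upper


def check_code_size(q, k):
    """Check q^k / V_q(k, k/2) > (q/4)^(k/2), cross-multiplied to stay in integers."""
    half = k // 2
    return q ** k * 4 ** half > ball_volume(q, k, half) * q ** half
```

Both checks only involve Python integers, so no rounding is involved even for large alphabet sizes such as $q = 2^{10}$.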

The following lemma shows that the bodies in $\mathcal{P}_{C'}$ are almost pairwise
disjoint.

Lemma 5. For distinct $A, B \in \mathcal{P}_{C'}$ we have

$$\mathrm{dist}(A,B) = 1 - \frac{|A \cap B|}{|A|} \ge 1 - e^{-k/(16n)}.$$

Proof. We constructed $\mathcal{P}_{C'}$ so that all bodies in it have the same volume.
This implies

$$\mathrm{dist}(A,B) = 1 - \frac{|A \cap B|}{|A|}.$$

Let $A = A_1 \times \dots \times A_k$ and $B = B_1 \times \dots \times B_k$. Then

$$\frac{|A \cap B|}{|A|} = \frac{|A_1 \cap B_1| \times \dots \times |A_k \cap B_k|}{|A_1| \times \dots \times |A_k|}.$$

Since the minimum relative distance in $C$ is at least 1/4 and the weight
of each codeword is $N/2$, we have that for $A_i \ne B_i$ the number of peaks in
$A_i \cap B_i$ is at most $2^n \cdot 3/8$. Hence

$$\frac{|A_i \cap B_i|}{|A_i|} \le \frac{1 + 3/(8(n-1))}{1 + 1/(2(n-1))}.$$
Since the minimum distance of $C'$ is at least $k/2$, we have $A_i \ne B_i$ for at
least $k/2$ values of $i$ in $[k]$. Therefore we get

$$\frac{|A \cap B|}{|A|} \le \left( \frac{1 + 3/(8(n-1))}{1 + 1/(2(n-1))} \right)^{k/2} \le \left( 1 - \frac{1}{8n} \right)^{k/2} \le e^{-k/(16n)}.$$

□
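A quick numerical check of the final estimate in this proof: the per-factor ratio equals $1 - 1/(8n-4)$, so raising it to the power $k/2$ stays below $e^{-k/(16n)}$. The sketch below, with names chosen for this example and using floating point, evaluates both sides for given $n \ge 2$ and $k$.

```python
from math import exp


def overlap_ratio_bound(n, k):
    """Upper bound on |A ∩ B| / |A| when k/2 factors differ."""
    per_factor = (1 + 3 / (8 * (n - 1))) / (1 + 1 / (2 * (n - 1)))
    return per_factor ** (k / 2)


def lemma5_holds(n, k):
    """Check overlap_ratio_bound(n, k) <= exp(-k / (16 n))."""
    return overlap_ratio_bound(n, k) <= exp(-k / (16 * n))
```

The bound only becomes meaningful when $k$ is large compared to $n$; for example with $n = 5$ and $k = 1000$ the overlap is already below $e^{-12}$.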



4 Proof of the lower bound 

Proof of Theorem 1. We will make use of the family $\mathcal{P}_{C'}$ that we constructed
in the previous section. Recall that the bodies in this family live in $\mathbb{R}^d$, for
$d := kn$. For this proof we will think of $d$ as fixed and we will choose $n$
appropriately for the lower bound proof. By a straightforward but tedious
argument it is enough to prove the theorem assuming that $d$ is a power of
2.

We will use Yao's principle (see, e.g., [9]). To this end, we will first show 
that the interaction between an algorithm and the oracles can be assumed 
to be "discrete", which in turn will imply that effectively there is only a 
finite number of deterministic algorithms that make at most q queries. The 
discretization of the oracles also serves a second purpose: that we can see 
deterministic algorithms as finite decision trees and use counting arguments 
to show a lower bound on the query complexity. 

Fix a body $K$ from $\mathcal{P}_{C'}$. Suppose that a randomized algorithm has
access to the following discretizations of the oracles:

• A discrete random oracle that generates a random point $X = (X_1, \dots, X_k)$
from $K = \prod_{i=1}^{k} K_i$ and, for each $i \in [k]$, outputs whether $X_i$ lies in the
corresponding cross-polytope or in which peak it lies.

• A discrete membership oracle that when given a sequence of indices
of peaks $I = (i_1, \dots, i_k)$ outputs, for each $j \in [k]$, whether peak $i_j$ is
present in $K_j$.
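The two discrete oracles can be sketched as follows. This is an illustrative model only: a product body is encoded as a list of $k$ sets of present peak indices (an encoding assumed here, not taken from the paper), volumes are normalized so that $|O_n| = 1$, and each peak then has volume $1/(2^n(n-1))$ as computed in Section 3.1.

```python
import random


def discrete_random_oracle(body, n, rng=random):
    """Sample the discretized location of a uniform point of the product body.

    body: list of k sets of present peak indices in range(2**n), with n >= 2.
    Returns a list of length k; entry i is None if X_i falls in the cross
    polytope, or the index of the peak of K_i containing X_i.
    """
    peak_vol = 1.0 / (2 ** n * (n - 1))  # relative to |O_n| = 1
    out = []
    for peaks in body:
        total = 1.0 + len(peaks) * peak_vol
        if rng.random() < 1.0 / total:
            out.append(None)  # the point lies in the cross polytope
        else:
            # All present peaks have equal volume, so pick one uniformly.
            out.append(rng.choice(sorted(peaks)))
    return out


def discrete_membership_oracle(body, indices):
    """For each factor j, report whether peak indices[j] is present in K_j."""
    return [i in peaks for i, peaks in zip(indices, body)]
```

Each answer of the random oracle takes one of at most $2^n + 1$ values per factor, which is the branching factor used in the counting argument of this section.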

Claim: A randomized algorithm with access to discrete versions of the
oracles can simulate a randomized algorithm with access to continuous
oracles with the same number of queries.

Proof of claim: We will show it for bodies in $\mathcal{P}$, i.e. cross polytopes
with peaks; the generalization of this argument to product bodies in $\mathcal{P}_{C'}$ is
straightforward. Let A and B be the algorithms with access to the continuous
and discrete oracles respectively. Algorithm B acts as A, except when A 
invokes an oracle, where B will do as follows: When A makes a query p to 
the continuous membership oracle, B will query the peak that contains p 
(we can assume that p lies in a peak, as otherwise the query provided no 
new information). Now suppose that A makes a query to the continuous 
random oracle and gets a point p. Then B makes a query to the discrete 
random oracle. B then generates a uniformly random point p' in the region 
that it got from the oracle. Clearly p' has the same distribution as p, namely 
uniform distribution on the body. 

If we see deterministic algorithms as decision trees, it is clear that there
are only a finite number of deterministic algorithms that make at most $q$
queries to the discrete oracles of $K$. Thus, by Yao's principle, for any
distribution $\mathcal{D}$ on inputs, the probability of error of any randomized algorithm
against $\mathcal{D}$ is at least the probability of error of the best deterministic
algorithm against $\mathcal{D}$.

Our hard input distribution $\mathcal{D}$ is the uniform distribution over $\mathcal{P}_{C'}$.
Now, in the decision tree associated to a deterministic algorithm, each node
associated to a membership query has two children (either the query point
is in the body or not), while a node associated to a random sample has at
most $(2^n + 1)^k$ children (the random sample can lie in one of the $2^n$ peaks or
in $O_n$, for each factor in the product body). Thus, if the algorithm makes at
most $q$ queries in total, then the decision tree has at most $(2^n + 1)^{kq}$ leaves.
These leaves induce a partition of the family of inputs $\mathcal{P}_{C'}$. By Lemma 5
the distance between any pair of bodies is at least $1 - e^{-k/(16n)} = 1 - e^{-d/(16n^2)}$,
where $n$ is chosen so that the output of the algorithm can be within $\epsilon$ of at
most one body in each part of the partition. That is,

$$2\epsilon < 1 - e^{-d/(16n^2)}.$$

As $d = kn$ is a power of 2, we can satisfy the previous inequality and the
integrality constraints of $k$ and $n$ by using our assumption that $8/d \le \epsilon \le
1/8$ and letting $n$ be a power of 2 such that

$$2\epsilon < 1 - e^{-d/(16n^2)},$$

which implies that we should take

$$n = \Theta(\sqrt{d/\epsilon}).$$
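The choice of $n$ can be made concrete with a short search. The sketch below assumes that the condition to satisfy is $2\epsilon < 1 - e^{-d/(16n^2)}$, which is what Lemma 5 requires for the output to be within $\epsilon$ of at most one body, and returns the largest power of two $n \ge 2$ dividing $d$ that meets it. The function name and the convention of returning `None` when no such $n$ exists are choices made for this example.

```python
from math import exp


def choose_n(d, eps):
    """Largest power of two n >= 2 dividing d with 2*eps < 1 - exp(-d/(16 n^2)).

    Returns None if no such n exists. The matching k is d // n.
    """
    best = None
    n = 2
    while n <= d:
        if d % n == 0 and 2 * eps < 1 - exp(-d / (16 * n * n)):
            best = n
        n *= 2
    return best
```

For $d = 2^{20}$ and $\epsilon = 1/8$ the search returns $n = 256$, so $k = d/n = 4096$; a smaller $\epsilon$ allows a larger $n$, in line with $n$ growing like $\sqrt{d/\epsilon}$.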



By Lemma 4 the total number of bodies is

$$|\mathcal{P}_{C'}| \ge 2^{c_2 k 2^n}.$$

This implies that the probability of error is at least

$$1 - \frac{(2^n + 1)^{kq}}{2^{c_2 k 2^n}}.$$

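If we want this error to be less than a given $\delta$, then for some $c_3, c_4 > 0$ we
need

$$q \ge c_3 \frac{2^n}{n} + c_4 \frac{\log(1-\delta)}{kn} = c_3 \frac{2^n}{n} + c_4 \frac{\log(1-\delta)}{d} = 2^{\Omega(\sqrt{d/\epsilon})} + c_4 \frac{\log(1-\delta)}{d}.$$

For $\delta = 1/2$ and $\epsilon < 1/4$ this implies

$$q \ge 2^{\Omega(\sqrt{d/\epsilon})}.$$

□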



5 Discussion 

Informally, our construction of $\mathcal{P}_{C'}$ can be thought of as "codes" in $\mathbb{R}^d$,
namely sets in $\mathbb{R}^d$ that are far from each other; the difficulty in the
construction of such codes comes from the requirements of convexity and that
the distributions of polynomially many random samples look alike. By using
slightly more involved arguments we can handle $\epsilon$ arbitrarily close to 1 and
prove a similar lower bound. It is not clear if such a lower bound is possible
for other learning settings that have been studied in the past, e.g. labeled
samples from the Gaussian distribution. Unlike that setting, we do not know a
matching upper bound for learning convex bodies in our model.

Our construction of the hard family is more "explicit" than that of [8]:
The hard family they construct is obtained by a probabilistic argument; our
construction can be made explicit by using good error correcting codes.
We mention here a somewhat surprising corollary of our result, without
detailed proof or precise numerical constants. Informally, it shows instability
of the reconstruction of a convex body as a function of the volume of its
intersection with halfspaces, relative to its volume. It is an exercise to see
that knowledge of $|K \cap H|/|K|$ for every halfspace $H$ uniquely determines
$K$. Moreover, given $d^c$ (for some fixed constant $c > 0$) random samples from
a convex body $K \subseteq \mathbb{R}^d$, with high probability we can estimate $|K \cap H|/|K|$
for all half-spaces $H$ within additive error of $O(1/d^{c'})$, where $c'$ is a positive
constant depending on $c$. This can be proved using standard arguments
about $\epsilon$-approximations and the fact that the VC-dimension of halfspaces in
$\mathbb{R}^d$ is $d + 1$.

We say that two convex bodies $K$ and $L$ are $\alpha$-halfspace-far if there is a
halfspace $H$ such that $\big| |K \cap H|/|K| - |L \cap H|/|L| \big| > \alpha$. Thus if we choose
some $t < c'$ and $K$ and $L$ are $1/d^t$-halfspace-far, then we can detect this
using $d^c$ random points, with high probability. Now, we claim that there
is a pair of bodies in $\mathcal{P}_{C'}$ that is not far. For otherwise, all pairs would be
far and we would be able to distinguish every body in $\mathcal{P}_{C'}$ from every other
body in $\mathcal{P}_{C'}$ with a sample of size $d^c$, and thus learn it. But as we have
proved, this is impossible. So we can conclude that there are two bodies in
$\mathcal{P}_{C'}$ that are not $1/d^t$-halfspace-far, i.e. they are $1/d^t$-halfspace-close. This
gives:

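Corollary 6. For any constant $t > 0$ and sufficiently large $d$ there exist two
convex bodies $K, L \subseteq \mathbb{R}^d$ such that $K, L$ are $1/d^t$-halfspace-close: for every
halfspace $H$

$$\left| \frac{|K \cap H|}{|K|} - \frac{|L \cap H|}{|L|} \right| \le \frac{1}{d^t}$$

but $\mathrm{dist}(K,L) > 1/8$.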

An earlier version of this manuscript mentioned the problem of whether
the volume of a convex body in $\mathbb{R}^d$ can be estimated from poly($d$) uniformly
random samples. Very recently, Ronen Eldan [3] has answered this in the
negative. His result provides a probabilistic construction of a family of
convex bodies such that the volume of a random body from this family is hard
to estimate from random samples. His result does not supersede ours in the
sense that our lower bound of $2^{\Omega(\sqrt{d})}$ is stronger, and perhaps optimal, and
our construction of the hard family is explicit.

It is known that if the convex body is a polytope with poly($d$) facets,
then it can be learned with poly($d$) uniformly random samples [6] in an
information-theoretical sense. However, whether this can be done efficiently
remains open:

Problem. Can one learn polytopes with poly($d$) facets from poly($d$)
uniformly random (over the polytope) samples in poly($d$) time?